Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Prawin R P, Pranav R P, Swathi R
DOI Link: https://doi.org/10.22214/ijraset.2023.54343
Renal failure is characterized by the progressive loss of kidney function over time. It is a serious medical condition that affects millions of people worldwide and is caused by the inability of the kidneys to properly filter waste and excess fluids from the blood. Renal failure can be a consequence of chronic kidney disease, a long-term condition in which the kidneys gradually lose function. If chronic kidney disease is not adequately managed, kidney function may continue to decline, leading to renal failure, so monitoring and managing chronic kidney disease is essential to prevent it. This research paper presents an approach for predicting renal failure using several machine-learning classification techniques. The study evaluates the performance of Decision Tree, Naive Bayes, Extreme Gradient Boosting, Logistic Regression, and Support Vector Machine classifiers using accuracy, precision, recall, and F1-score. The proposed method can be useful for early diagnosis and treatment of renal failure, thus reducing the complications and costs associated with the disease. By comparing the performance of these models, we aim to identify the most effective approach for predicting renal failure and provide valuable insights for clinical practice.
I. INTRODUCTION
The human body has two kidneys located at the back of the peritoneal cavity, which are vital organs necessary for its proper functioning. The main function of the kidneys is to regulate the balance of salt, water, and other ions and trace elements in the human body, such as calcium, phosphorus, magnesium, potassium, chlorine, and acids. Data mining is the computer-based process of extracting useful information from very large datasets. [2] It is particularly useful for exploratory analysis, where insight must be drawn from large volumes of evidence. Clinical data mining has great potential for uncovering hidden patterns in the datasets of the clinical domain. Such data should be gathered in a consistent, organized form.
The gathered data can then be used to build a clinical information system. [5] Data mining offers a user-oriented approach to discovering novel and hidden patterns in the data. [7] In this research paper, we present an approach for predicting renal failure using several machine-learning classification techniques. The paper analyzes renal failure prediction using classification algorithms. [13] The study evaluates the performance of Decision Tree, Naive Bayes, Extreme Gradient Boosting, Logistic Regression, and Support Vector Machine classifiers using several performance evaluation metrics. The proposed method can be useful for early diagnosis and treatment of renal failure. By comparing and analyzing the performance of these models, we determine the best approach for predicting renal failure and provide insightful information for clinical practice.
II. LITERATURE REVIEW
Gunarathne et al. [1] compared different machine learning models and found that the Multiclass Decision Forest algorithm had the highest accuracy, approximately 99%, on a reduced dataset with 14 attributes. However, a model's accuracy depends on various factors, and the findings may not generalize to other datasets or contexts.
Salekin and Stankovic [3] utilized a novel machine-learning approach to detect Chronic Kidney Disease in a dataset of 400 records and 25 attributes. Their study employed k-nearest neighbors, random forest, and neural network algorithms, along with a wrapper method for feature reduction, and achieved high detection accuracy. The researchers thereby demonstrated the potential of machine learning to improve Chronic Kidney Disease diagnosis and treatment.
Pinar Yildirim [4] investigated the impact of class imbalance on neural network algorithms for making medical decisions about Chronic Kidney Disease. The comparative study conducted using sampling algorithms demonstrated that their use can enhance the performance of classification algorithms. Moreover, the research highlighted the critical role of the learning rate in multilayer perceptron, significantly affecting its performance.
Guneet Kaur and Ajay Sharma [6] proposed a system for predicting Chronic Kidney Disease using data mining algorithms in Hadoop. The study utilized two classifiers, KNN and SVM, and manually selected data columns for predictive analysis. The results indicated that the SVM classifier outperformed KNN in accuracy, demonstrating the potential of this approach for Chronic Kidney Disease prediction.
Vasquez-Morales et al. [8] created a neural network model for predicting the risk of developing Chronic Kidney Disease, using a dataset of 40,000 instances. The accuracy of their model was reported as 95%.
Chen et al. [9] evaluated the performance of KNN, SVM, and SIMCA (Soft Independent Modelling of Class Analogy) models for predicting the risk of Chronic Kidney Disease using a dataset from UCI. The SVM and KNN models achieved the highest accuracy of 99.7%, and SVM was found to be the most robust against noise disturbance.
Padmanaban and Parthiban [10] proposed the use of machine learning classifiers for early detection of Chronic Kidney Disease in diabetic patients. They collected data from a diabetes research center in Chennai and evaluated the performance of Naive Bayes and Decision Tree algorithms using the Weka tool. Their study found that the Naive Bayes classifier had the highest accuracy, at 91%.
De Almeida et al. [11] conducted a study using Decision tree, Random Forest, and Support Vector Machine with various functions on the MIMIC-II database to predict Chronic Kidney Disease. They found that Decision tree and Random Forest had the highest accuracy, with prediction accuracies of 87% and 80%, respectively.
Deepika et al. [12] developed a Chronic Kidney Disease prediction project on a 24-attribute dataset using KNN and Naïve Bayes machine learning algorithms. The KNN algorithm achieved an accuracy of 97%, while the Naïve Bayes algorithm achieved an accuracy of 91%.
S. R. Raghavan, V. Ladik, and K. B. Meyer [14] suggested a decision support system called DARWIN, which is an intelligent software tool that assists doctors in determining the appropriate erythropoietin dosage for Chronic Kidney Disease patients. This system makes it simpler for doctors to calculate the dosage for thousands of patients within a month, which is a challenging task in the management of chronic kidney disease.
III. METHODOLOGY
A. Dataset
The dataset used here is taken from the UCI Machine Learning Repository, a collection of datasets used for evaluating machine learning algorithms. It is a real-world dataset containing 400 instances with 25 clinical attributes. The attributes cover tests and findings related to kidney disease, such as diabetes mellitus, hypertension, coronary artery disease, anemia, red blood cell count, and white blood cell count.
Table 1. List of Attributes in the Dataset
Attributes | Type
Age | Numeric
Blood Pressure | Numeric
Specific Gravity | Numeric
Albumin | Numeric
Sugar | Numeric
Red Blood Cells | Nominal
Pus Cell | Nominal
Pus Cell Clumps | Nominal
Bacteria | Nominal
Blood Glucose Random | Numeric
Blood Urea | Numeric
Serum Creatinine | Numeric
Sodium | Numeric
Potassium | Numeric
Hemoglobin | Numeric
Packed Cell Volume | Numeric
Red Blood Cell Count | Numeric
White Blood Cell Count | Numeric
Hypertension | Nominal
Diabetes Mellitus | Nominal
Coronary Artery Disease | Nominal
Appetite | Nominal
Pedal Edema | Nominal
Anemia | Nominal
Class | Class
B. Architecture Diagram
C. Pre-Processing
The main objective of preprocessing is to transform raw data into a format that machine learning algorithms can use directly. Techniques such as data cleaning, handling of missing values, and outlier detection help to maximize the accuracy and effectiveness of the algorithms. Clean data allows the algorithms to identify patterns and relationships more reliably, leading to more accurate and meaningful predictions and insights.
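The paper does not specify how each preprocessing step was carried out, so the following is only a minimal sketch of the three steps named above. It assumes that missing values are marked with "?", that numeric attributes are filled with the median and nominal attributes with the mode, and that outliers are flagged with the 1.5 × IQR rule. The function names and the toy values are hypothetical and are not taken from the study.

```python
from statistics import median, mode, quantiles

MISSING = "?"  # assumed marker for a missing value in the raw records


def impute(column, numeric):
    """Fill missing entries with the median (numeric) or the mode (nominal)."""
    observed = [v for v in column if v != MISSING]
    if numeric:
        fill = median(float(v) for v in observed)
        return [float(v) if v != MISSING else fill for v in column]
    fill = mode(observed)
    return [v if v != MISSING else fill for v in column]


def iqr_outliers(values, k=1.5):
    """Return the indices of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


def encode_nominal(column):
    """Map each distinct nominal label to an integer code (sorted label order)."""
    codes = {label: i for i, label in enumerate(sorted(set(column)))}
    return [codes[v] for v in column]


# Toy columns standing in for Age (numeric) and Hypertension (nominal).
age = impute(["48", "?", "62", "51"], numeric=True)
htn = impute(["yes", "no", "?", "no"], numeric=False)
htn_codes = encode_nominal(htn)

# Toy serum creatinine readings with one extreme value.
flagged = iqr_outliers([1.2, 0.8, 1.1, 0.9, 1.0, 15.0])
```

Median and mode imputation are chosen here only because they are simple and robust to skewed clinical measurements. Whether a flagged outlier should be removed or kept is a clinical judgment, since extreme values can be genuine signs of disease.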
D. Classification Models
IV. RESULTS AND DISCUSSIONS
A. Performance Evaluation
The prediction model must be evaluated to ensure that it fits the dataset and works well on unseen data. The aim of performance evaluation is to estimate the generalization accuracy of a model on unseen, out-of-sample data. Several performance evaluation metrics, namely accuracy, precision, recall, and F1-score, have been computed. These metrics are derived from the confusion matrix, which describes the performance of a classifier in terms of four counts. A True Positive (TP) is an instance that the model predicts as positive and that actually belongs to the positive class. A True Negative (TN) is an instance that the model predicts as negative and that actually belongs to the negative class. A False Positive (FP) is an instance that the model predicts as positive but that actually belongs to the negative class. A False Negative (FN) is an instance that the model predicts as negative but that actually belongs to the positive class. These four counts [16] are used to evaluate the performance of the binary classification models and provide a more comprehensive understanding of their accuracy and reliability.
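To make the relationship between the confusion matrix and the reported metrics concrete, the sketch below computes accuracy, precision, recall, and F1-score from the TP, TN, FP, and FN counts using their standard definitions. The function names, the label encoding (1 for the positive class), and the toy predictions are illustrative assumptions and do not reproduce the study's results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP, and FN for a binary classification task."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def evaluate(tp, tn, fp, fn):
    """Derive accuracy, precision, recall, and F1-score from the four counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels: 1 = positive class, 0 = negative class (assumed encoding).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
scores = evaluate(*counts)
```

In a screening setting, recall is often the metric to watch most closely, because a false negative corresponds to a patient whose disease goes undetected.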
V. FUTURE ENHANCEMENTS
Several directions remain for future work. The proposed method could be incorporated into clinical practice and its impact on patient outcomes and healthcare costs evaluated in a real-world setting. Real-time monitoring data from wearable devices could be incorporated to improve the early detection and diagnosis of renal failure. The method could be applied to different populations and cultures to increase the generalizability of the findings. A web-based or mobile application could make the method more accessible to patients and healthcare providers. The method could be combined with other biomarkers to improve the early diagnosis of renal failure. Finally, more advanced feature selection methods could be incorporated to improve the interpretability of the models.
VI. CONCLUSION
In conclusion, this research proposed a machine learning-based approach for predicting renal failure using several classification techniques. Five machine learning algorithms were applied to the dataset, and their performance was evaluated using accuracy, precision, recall, and F1-score. Extreme Gradient Boosting achieved the highest accuracy, 98.75%, followed by Decision Tree and Naive Bayes at 97.50% each. Logistic Regression and Support Vector Machine produced the lowest accuracy, 93.75% each. Extreme Gradient Boosting also produced the highest F1-score. These results suggest that the proposed method can support early diagnosis and treatment of renal failure, which may lead to better patient outcomes and reduced complications. The research provides insight into the most suitable technique for early diagnosis of renal failure, and its results may serve as a basis for further research in this field.
REFERENCES
[1] Gunarathne, W. H. S. D., Perera, K. D. M., & Kahandawaarachchi, K. A. D. C. P. (2017, October). Performance evaluation on machine learning classification techniques for disease classification and forecasting through data analytics for chronic kidney disease (CKD). In 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE) (pp. 291-296). IEEE.
[2] Arasu, S. D., & Thirumalaiselvi, R. (2017). Review of chronic kidney disease based on data mining techniques. International Journal of Applied Engineering Research, 12(23), 13498-13505.
[3] Salekin, A., & Stankovic, J. (2016, October). Detection of chronic kidney disease and selecting important predictive attributes. In 2016 IEEE International Conference on Healthcare Informatics (ICHI) (pp. 262-270). IEEE.
[4] Yildirim, P. (2017, July). Chronic kidney disease prediction on imbalanced data by multilayer perceptron: Chronic kidney disease prediction. In 2017 IEEE 41st Annual Computer Software and Applications Conference (COMPSAC) (Vol. 2, pp. 193-198). IEEE.
[5] Snegha, J., Tharani, V., Preetha, S. D., Charanya, R., & Bhavani, S. (2020, February). Chronic kidney disease prediction using data mining. In 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE) (pp. 1-5). IEEE.
[6] Kaur, G., & Sharma, A. (2017, November). Predict chronic kidney disease using data mining algorithms in Hadoop. In 2017 International Conference on Inventive Computing and Informatics (ICICI) (pp. 973-979). IEEE.
[7] Qin, J., Chen, L., Liu, Y., Liu, C., Feng, C., & Chen, B. (2019). A machine learning methodology for diagnosing chronic kidney disease. IEEE Access, 8, 20991-21002.
[8] Vásquez-Morales, G. R., Martinez-Monterrubio, S. M., Moreno-Ger, P., & Recio-Garcia, J. A. (2019). Explainable prediction of chronic renal disease in the Colombian population using neural networks and case-based reasoning. IEEE Access, 7, 152900-152910.
[9] Chen, Z., Zhang, X., & Zhang, Z. (2016). Clinical risk assessment of patients with chronic kidney disease by using clinical data and multivariate models. International Urology and Nephrology, 48, 2069-2075.
[10] Padmanaban, K. A., & Parthiban, G. (2016). Applying machine learning techniques for predicting the risk of chronic kidney disease. Indian Journal of Science and Technology, 9(29), 1-6.
[11] De Almeida, K. L., Lessa, L., Peixoto, A., Gomes, R., & Celestino, J. (2020, January). Kidney failure detection using machine learning techniques. In 8th International Workshop on Advances in ICT Infrastructures and Services (ADVANCE 2020) (pp. 1-8).
[12] Deepika, B., Rao, V. K. R., Rampure, D. N., Prajwal, P., & Gowda, D. G. (2020). Early prediction of chronic kidney disease by using machine learning techniques. Amer. J. Comput. Sci. Eng. Survey, 8(2), 7.
[13] Shirahatti, A., Yadav, V., Vadde, N., Singh, A., Mahajan, P., & Sheikh, R. Prediction and preventive awareness of chronic kidney disease using machine learning algorithms.
[14] Raghavan, S. R., Ladik, V., & Meyer, K. B. (2005). Developing decision support for dialysis treatment of chronic kidney failure. IEEE Transactions on Information Technology in Biomedicine, 9(2), 229-238.
[15] Aprilianto, D. (2020). SVM optimization with correlation feature selection based binary particle swarm optimization for diagnosis of chronic kidney disease. Journal of Soft Computing Exploration, 1(1), 24-31.
[16] Abinaya, U., Devi, S. A., Haritha, B., & Raghunathan, T. (2021, May). Noval approach for chronic kidney disease using machine learning methodology. In Journal of Physics: Conference Series (Vol. 1916, No. 1, p. 012164). IOP Publishing.
Copyright © 2023 Prawin R P, Pranav R P, Swathi R. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET54343
Publish Date : 2023-06-22
ISSN : 2321-9653
Publisher Name : IJRASET